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THREE  ASPECTS  OF  THE  STATISTICS  OF  DIRECTIONS 

by 


Geoffrey  S.  Watson 
Princeton  University 

ABSTRACT 


This  paper  is  a  selection  of  topics  from  three  lectures  given 
to  the  21st  Summer  Research  Institute  of  the  Australian  Mathe¬ 
matical  Society.  Lecture  1  gave  scientific  problems  yielding 
data  which  are  unit  vectors--directions--in  two  and  three  dimen¬ 
sions.  Methods  of  displaying  and  summarizing  the  data  were 
illustrated.  Lecture  2  began  with  the  uniform  distribution  on  a 
sphere  of  unit  radius  in  q  dimensions,  then  non-uniform  dis¬ 
tributions  were  discussed,  especially  those  that  arise  in  certain 
stochastic  processes.  Lecture  3  was  devoted  to  a  summary  of 
statistical  inference  methods  and  concluded  with  some  remarks  on 
problems  of  greater  generality  suggested  by  our  subject. 


This  research  was  supported  in  part  by  a  contract  with  the  Office 
of  Naval  Research,  No.  N00014-79-C-0322  ,  awarded  to  the  Depart¬ 
ment  of  Statistics,  Princeton  University. 


1.  THE  STUDY  OF  ORIENTATION  DATA. 


Geology  and  geophysics  were  the  first  sciences  to  require  the  analysis 
of  orientation  data.  In  geology,  the  orientation  of  pebbles  and  bedding 
planes  and  other  bodies  gives  directions  with  and  without  sense.  The  latter 
(e.g.  a  normal  to  a  plane)  are  often  called  axial  directions  for  lack  of  a 
better  term.  The  orientation  of  crystals  leads  to  rotation  matrices.  Here 
we  will  only  consider  directions  i.e.  (signed)  unit  vectors.  They  were 
first  given  serious  study  when  Fisher  (1953)  wrote  a  paper  for  paleomagnetic 
workers.  This  was  this  writer's  initial  motivation.  A  survey  of  applica¬ 
tions  is  given  in  Watson  (1970).  Later  biologists  interested  in  the  homing 
directions  of  pidgeons  provided  two  dimensional  data  and  further  problems. 

As  will  be  seen,  the  study  of  di>*ctiona!  data  forces  us  to  modify 
the  methods  and,  more  interestingly,  the  concepts  which  statisticians  have 
long  used  for  analyzing  vector  data.  The  shift  from  observations  xeIRq 
to  observations  xeflq  ,  the  surface  of  the  unit  ball  in  IRq  requires  new 
ideas.  In  practice,  we  have  only  so  far  needed  methods  of  ftp  and  ^3  » 
the  circle  and  sphere. 

Given  n  data  points  Xj,...,xn  eJlq  ,  we  need  first  methods  for 
looking  at  the  data.  For  q=2  ,  the  points  may  be  marked  on  a  circle  since 
they  are  1-1  with  angles.  If  [0,2ir)  is  split  into  intervals,  a  fre¬ 
quency  distribution  is  obtained.  A  histogram  using  sectors  rather  than 
boxes  is  called  a  "rose  diagram"— the  radius  is  usually  proportional  to 
the  frequency  but  it  is  better  to  use  the  square  root  of  the  frequency. 

This  transformation  stabilizes  the  variance  which  makes  eye- inspection  for 
peaks  or  modes  easier.  In  simple  cases,  the  data  will  show  only  one  modal 
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or  "preferred"  direction  in  which  case  we  then  need  some  measure  of  "disper¬ 
sion"  about  this  direction.  With  larger  samples  we  might  wish  to  use  a 
non-parametric  density  estimator.  We  will  return  to  these  questions. 

To  see  spherical  data,  it  is  usually  only  practical  to  look  at  plane 
projections  of  the  points.  The  computer  allows  us  easily  to  rotate  the 
data— often  it  is  mainly  one  hemisphere.  It  is  then  natural  to  use  an  equal 
area  projection.  If  the  rotated  points  are  identified  with  their  spherical 
polar  angles  e(colatitude) ,  ^(longitude),  the  spherical  area  element 
sineded*))  should  equal  the  planar  area  element  pdpd<j>  .  Hence  we  find 
p=2|sine/2|  .  This  Lambert  projection  is  called  the  Schmidt  net  in  Geology. 

If  the  data  x,,...,x  seems  to  be  uni-modal,  it  is  natural  to 

n 

define  the  modal -direction  to  be  the  unit  vector  y  parallel  to  R=Ex.  , 

i  j 

the  vector  resultant  or  sum  of  the  data.  If  the  data  is  widely  dispersed, 
the  length  ||R||  of  R  will  be  much  smaller  than  n  .  If  it  is  tightly 
clustered  about  y  ,  ||Rj|  will  be  almost  n  .  Hence  n-||R|[  is  a  measure 
of  dispersion  of  the  data  set,  an  analogue  in  fact  of  the  reciprocal 
l(x--x)  for  the  familiar  case  when  the  x.  are  real  numbers.  So  we  may 

J  J 

expect  that  y  and  will  in  some  way  play  the  roles  of  the  familiar 

mean  and  variance.  Of  course  if  the  data  cluster  is  not  fairly  rotationally 
symmetric  about  y  ,  more  than  one  number  will  be  needed  to  describe  its 


dispersion. 

If  the  data  points  xj  are  regarded  as  unit  masses. 


1  n 

£  Ex.  is  their 
n  ^  J 


center  of  gravity.  As  we  have  just  seen,  the  position  of  this  point  within 


the  sphere  tells  us  something  about  the  distribution  of  the  data.  The 
moment  of  inertia  clearly  tells  us  something  more.  This  leads  us  to  suggest 
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the  computation  of 


M  -  Ixjxj 


where  xj  is  the  transpose  of  the  column  vector  .  For  the  value  of 

v'Mv=l(v'Xj)  will  change  as  the  unit  vector  v  varies.  The  stationary 

values  are  the  eigen  values  of  M  .  If*  for  example,  the  data  points  lie 

fairly  uniformly  around  a  great  circle,  one  eigen  value  will  be  much 

smaller  than  the  others  which  will  be  nearly  equal— for  there  is  a  v 

nearly  orthogonal  to  all  the  data  and  this  is  the  eigen  vector  associated 

with  the  minimum  eigenvalue.  The  reader  can  easily  see  what  would  be 

suggested  by  equal  eigen  values  or  one  dominating  eigen  value.  Note  that 
n  n 

trace  M=E  trace  x^.xj  =  Exjx^  =  n  . 

Thus  one  should  add  to  the  rotation  and  projection  program,  these 
trivial  computations  and  have  the  results  printed  out  below  the  pictures 
which  we  obtain  by  making  hard  copies  from  a  Textronics  (C.R.T.)  terminal. 


With  real  data, power  moments  are  sometimes  used.  For  circular  data 

1  ^  i  i 

x.<— >0.  ,  it  is  even  more  natural  to  use  -  Eexpike.  *  (-Ecoske.  ,  -Esinke.) 

jj  n  j  J  n  J  n  j 

for  integer  k  because  for  any  density  on  the  circle  with  a  Fourier  series 


oo  ,n 

representation  f(0)*E  c_  expime  ,  the  expectation  of  n"AE  expike.;  is 

-»  m  1  J 

c_k  .  This  leads  to  a  non-parametric  density  estimator— see  Watson  (1969). 


For  the  sphere,  spherical  harmonics  may  be  used  in  a  similar  way. 


For  data  Xj,...,xn  on  ,  a  non-parametric  density  estimator  of 
the  kernel  type  may  be  constructed  as  follows.  Let  6n(x;z)  be  a  sequence 
of  probability  densities  on  corresponding  to  probability  distributions 
which  concentrate  on  the  fixed  unit  vector  z  as  n^»  .  For  example  the 
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density 


e(an)expanx'z 


could  be  used  with  an-*®  as  tv*®  .  Then  the  estimator 


fn(z)  *  IT  j“n<V2) 
will  become  unbiased  as  rv>®  since 


E?n(z)  =  /6n(x;z)  f(x)du) 


where  du  is  the  area  element  on  fiq  .  Then 


Ef  n  ( z  )-»-f  ( z )  ,  as  n-*® 


Furthermore 


var  ?n(z)  =  ^  var  6n(X;z) 

*  ^{/6j(x;z)f(z)du  -  ( E?n ( z ) ) 2 >  , 

~  ^  ^  /  S*(x;z)dw  •  as  n-*® 

so  that  E(fn(z)-f(z))2-*  0  provided  that  /6*(x;z)du>  tends  to  infinity 
slower  than  n  . 

With  the  choice  (1),  the  estimator  (2)  Is  easily  programmed.  To  see 
the  result  we  also  need  a  contouring  program.  The  contours  may  be  shown, 
along  with  the  original  data  points,  by  using  the  Lambert  projection  men¬ 
tioned  above.  The  neatest  method  uses  overlaid  transparencies.  The  notion  of  a 
kernel  estimator  seems  to  occur  first  in  Watson  (1970)  but  the  first  Implementa¬ 
tion,  much  as  I  have  described  here,  seems  to  be  In  a  thesis  on  polar  wander¬ 
ing  by  Alstine  (1979). 


I 


In  Lecture  1,  all  these  methods  were  illustrated  on  a  data  set— the 
normals  to  the  orbits  of  all  the  comets  in  the  latest  Smithsonian  Catalogue 
(1979). 

As  will  be  explained  in  the  next  section,  for  uni -modal  distributions 
with  rotational  symmetry,  the  most  commonest  model  is  the  density 

c(k)  exp  k  x'y  (3) 

where  y  is  the  modal  direction  and  tc  is  an  accuracy  parameter.  It  is 
therefore  important  to  have  a  quick  method  to  check  the  fit  of  (3)  to  data. 
For  q=3  ,  it  will  be  shown  below  that: 

4)  is  uniformly  distributed  on  [0,2ir)  (4) 

independently  of 

exp-ic(l-cose)  which  is  approximately  uniformly  (5) 

distributed  oi  [0,1]  . 

It  is  easy  to  check  uniformity— we  may  use  histograms,  (or  their  computer 
form,  stem  &  leaf  plots),  P-P  or  Q-Q  plots  (essentially  the  same  for  uniform 
distributions).  To  carry  this  out  for  (5),  It  Is  necessary  to  use  an 
estimator  of  k  e.g.  the  estimator  (n-l)(n-|R| )“l  may  be  used.  For  q*2, 
(3)  may  be  written 

'x'KCOse  (6) 

where  cos8»x'y  and  IQ(»c)  Is  a  Bessel  function.  Using  the  maximum  likeli¬ 
hood  estimators  of  y  and  <  (see  Section  3),  It  is  easy  to  compute  an 
approximate  P-P  plot. 
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In  Lecture  1,  these  quick  checks  of  (3)  for  q=2,3  were  illustrated  ,] 

on  the  comet  data  and  no  subset  of  comets  was  found  to  fit  this 
distribution. 
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2.  PROBABILITY  DISTRIBUTIONS  ON  flq 

In  the  notation  of  Muller  (1966),  let  dwq  be  the  area  element  on 
flq  =  {x | xeTR q , || x ||  =  1}  ,  |nq|  be  the  area  of  ftq  .  If  are 

orthonormal  vectors  in  IRq  ,  and  xeftq  ,  then 

‘■•V  /rF‘  5,-1 

where  t=x'eq  ,  5q_i  =  unit  vector  in  the  space  spanned  by  £j,...,e  and 
du>q  =  (l-t2)(q"3)/2dt  da)q  l  . 

(The  sDecial  case  du3  =  dcos0d<J>  is  well  known.)  Thus 

u>  =  a)  i  I  (l-t2)(q‘3)/2dt  , 
q  q- 1  _i 

so  that 

u)q  =  2TTq/2  /  r(q/2)  . 

The  characteristic  function,  or  Fourier  transform,  of  any  density 
f(x)  on  nq  is  given  by 

E(expi6"x)  =  /  expie'x  f(x)dw  ,  8dRq 

The  fundamental  distribution  on  Slq  is  the  uniform  distribution  with 
density  |Oq|-1  .  Setting  0*6a  ,  aeOq  ,  a'x=t  ,  we  find  that 

E(explet)  •  saj.  '>1  m 

wq  (0/2 )v 

vrfiere  v*(q-l)/2  and  Jv(e)  is  a  Bessel  function. 


-8- 


1 1 

If  now  Y=a'(Ix.)=t,+. . .+tn  ,  the  characteristic  function  of  Y  is 
*1  j  i  n 

the  n-th  power  of  (7).  Hence  the  density  of  Y  may  be  formally  obtained 

n 

by  inverting  the  n-th  power  of  (7).  The  distribution  of  ]| Zx •  ||  can  then 

1  3  n 

be  deduced  from  the  additional  observation  that  the  direction  of  Ex.  is 

1  J 

uniform. 

While  the  exact  distribution  of  the  length  of  the  sum  of  n  independent 
and  uniformly  distributed  vectors  may  be  deduced  this  way,  it  is  an  analytically 
awkward  result.  However  it  is  easy  to  obtain  an  excellent  approximation  when 
n  is  not  small.  For 


llEXjll2  =  Z?  +...+  Z2  =  ||Z|| 2 

where  Zk  =  sum  of  the  k-th  coordinates  of  x^,...,xn  .  By  the  Central 
Limit  Theorem,  these  sums  become  Gaussian.  For  all  n  ,  the  mean  vector  and 
covariance  matrix  of  Z  are  respectively  0,  nlq/q  .  Hence  the  asymptotic 
distribution  of  qR2/n  is  the  chi-square  distribution  with  q  degrees  of 
freedom.  This  result  for  q=2,3  goes  back  to  Rayleigh  (1880,1919)  and  is 
central  to  the  problem  of  random  flights,  or  as  it  was  first  called  by  Pearson 
(1905),  random  walks.  The  flights  were  of  mosquitoes  and  the  problem  was 
raised  in  this  form  by  Sir  Ronald  Ross'  speculations  of  the  spread  of  fevers. 
Rayleigh  was  concerned  with  the  random  phases  of  sound. 

The  random  walks  not  only  serve  as  a  basic  model  in  many  areas  of 
science  but  have  deep  and  wide  connections  with  mathematics.  The  literature 
on  them  is  vast. 

Non-uniform  distributions  are  interesting  to  statisticians  for  one  of 


several  reasons: 
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(i)  they  fit  data 

(ii)  they  arise  from  stochastic  modelling  and 

(iii)  it  is  easy  to  derive  inference  methods  for  them. 

With  the  advent  of  computers  (iii)  is  less  compelling  than  it  once  was. 
Further  with  the  mathematical  techniques  available  now  for  large  sample 
theory,  (iii)  is  also  less  important. 

Densities  which  have  been  suggested  with  an  eye  on  (i)  and  (iii)  are: 

x,aeQq 

f  *  expica 'x  ,  (8) 

(rotational  symnetry  about  a  single  mode) 

f  «  expic(a'x)2  (9) 

(two  equal  but  opposite  modes  tc>o  ,  a  girdle  distribution  «o  ,  rotational 
symmetry) 


f  «  exp<a'x  +  A(a'x)2  (10) 

(two  unequal  but  opposite  clusters,  rotational  symmetry) 

The  use  of  the  exponential  rather  than  a  more  general  function  is  due  to  (iii) 
for  the  joint  density  of  independent  observations  is  the  product  of  the 
densities— this  will  become  clearer  to  non-statisticians  in  Section  3. 

The  modelling  approach— asking  what  processes  actually  led  to  the  data— 


often  reveals  that  the  directions  are  in  fact  the  directions  of  random 
vectors  whose  lengths  have  been  ignored.  This  is  so  in  paleomagnetism  for 


9 
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example.  Let  X  be  a  random  vector  in  IR q  with  density  f(x)  with  length 
r  so  X=y«,  where  !|£||=1  so  SL  z  .  Then  the  density  of  £  is 
00 

g(£)  =  /  f(r£)rq'1dr  (11) 

0 

For  example  if  X  is  Gaussian  with  mean  vector  y  and  covariance  matrix  l  , 
the  density  g  is  then  called  the  angular  Gaussian.  Special  cases  have 
special  names. 

Work  in  structural  geology  led  the  writer  to  consider  Y=TX  where 
|T|=detT>0  .  Setting  Y=rm  with  me£l  ,  it  follows  from  (8)  that  the  density 
of  m  ,  h(m)  is  defined  by 

h(")  •  FT  (,2) 

The  righthand  side  of  (12)  is  unchanged  if  T-*-kT  so  there  is  no  loss  of 
generality  in  taking  |T|=1  .  The  transformation  Y=TX  is  a  homogeneous 
strain  so  we  see  that  if  only  the  orientations  are  known,  one  may  not  deter¬ 
mine  the  dilatation  [T|  of  the  strain. 

A  most  elegant  formula  is  easily  deduced  from  (12), 

/  UT^mirV.  •  wJdetTj  (13) 

Q  H  H 

q 

There  are  many  more  interesting  formulae  of  this  type.  Again,  we  may  derive 
the  density 

ijifr  •  |T|>0-lEnq 

=  g(-£) 


(14) 
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I  positive  definite,  (14)  becomes 

|^2  (rz_1Ji)q  (15) 

which  has  been  noted  before  by  several  authors  (e.g.  King  (1980). 

The  above  paragraph  has  used  the  marginal  density  of  vectors,  integrating 
out  their  length.  Conditional  distributions  (given  that  their  length  is 
unity)  don't  seem  to  arise  in  modelling  but  are  of  mathematical  interest. 

For  example  if  X  is  Gaussian  mean  p  covariance  o2Iq  ,  the  density  func¬ 
tion  of  X  is  proportional  to 

exp  "  2a7  '  2x  u  + 

so  the  conditional  distribution  of  x  ,  given  ||x||*l  is  proportional  to 

exp  Jrx'u  ,  (16) 

as  was  first  pointed  out  by  Fisher. 

This  density  (16)  has  arisen  above--(8).  For  q=2  it  was  first  sug¬ 
gested  by  vonMises  (1918).  For  q=2,3,  it  was  suggested  and  briefly  studied 
by  Arnold  (1941).  For  q=3  ,  it  was  the  basis  of  Fisher's  (1953)  paper. 

Another  distribution  whose  mathematical  form  is  convenient  for 
statistical  inference  has 

f  «  exp  x'Kx  (  K  a  symmetric  matrix) 


If  we  may  set  T=l"^2 


9(0  =  w’1 


and  was  studied  by  Bingham  (1974).  Density  (9)  was  suggested  by  Watson 
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(1965)  and  Scheidegger  (1965),  density  (10)  appears  in  Fraser  (1979). 

A  fertile  source  of  distributions  is  Brownian  motion.  If  a  particle 
moves  in  TRl  with  independent  steps  dx=?dt  ,  E£=0  ,  E£2=a2  at  time 
steps  dt  ,  the  probability  density  <j>(x,t)  of  its  position  x  at  time  t 
is  given  by 

♦  (x.t)  »  — exp  -  Jtz  (17) 

by  well  known  arguments.  If  the  same  motion  occurs  on  the  circumference  of 
a  circle  of  unit  radius,  the  position  6  of  the  particle  at  time  t  is 
given  by 

oo 

f(e,t)  *  r  4>(0+2frk,t)  ,  (18) 

-oo 

*7^-1  expfm2o2t)expim0  , 

i  00 

*  x-  (l+2lexp(-m2a2t)cosm8)  .  (19) 

1 


Because  of  (18),  f (6 ,t)  ■'s  often  called  the  "rolled-up"  or  "wrapped"  normal 
density.  (19)  may  be  contrasted  with  the  von  Mises  density  which  has  the 
Fourier  series 


1 

SF 


00  V*) 

(1+2Z  -j— | cosme) 


(20) 


Both  (19)  and  (20)  have  modes  at  0*0  and  the  agreement  is  remarkable  if 
the  coefficients  of  cos0  are  identified  i.e. 


* 


I1(<)/I0(<) 


(21) 
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It  may  be  shown  (Hartman and  Watson  (1976)}  that  stopping  the  circular  dif¬ 
fusion  at  a  random  time,  the  density  (20)  may  be  obtained  exactly.  Pitman 
and  Yor  (1980)  have  shown  that  there  is  more  than  one  such  stopping  time 
distribution.  Similar  results  are  true  for  . 

q 

It  is  also  possible  to  obtain  the  densities  (8)  and  (9)  by  other  dif¬ 
fusion  models.  For  example  if  we  consider,  following  Kent  (1976),  a  circular 
diffusion  like  the  previous  one  but  with  a  drift  term  so  that  the  step  in 
dt  is 


de  =  -icsine  dt  +  ZM 

then,  as  t-**>  ,  the  chance  that  the  particle  will  be  found  in  the  interval 
(0,6+de)  tends  to  d6expiccos8  ,  the  von  Mises  density.  If  the  step  is  given 
by 


de  =  -KSin2e  dt  +  Z  vfit  . 

the  limiting  density  is  proportional  to  exp<cos2e  ,  which  is  the  q=2  form 
of  (9).  An  intuitive  proof  of  the  first  of  these  results  follows  from  con¬ 
sidering  a  physical  diffusion  in  a  circular  pipe  of  cross-section  area  A. 

Let  the  concentration  of  particles  at  6  be  f (6)  ,  then  by  Fick's  law  the 
diffusion  forward  across  A  at  6  is  -AD3f/36  ,  where  D  is  a  diffusion 
coefficient.  If  the  medium  has  a  velocity  v(e)  at  e  ,  the  transport 
across  A  Is  Af(9)v(e)  .  At  equilibrium  the  number  of  particles  in  the  pipe 
between  e  and  6+de  must  be  constant.  Hence 
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o  r 

Df  "(vf ) '  =  0  (22) 

i 

Setting  v(0)=-KSin0  and  D=1  ,  the  solution  f  is  seen  to  be  proportional 
to  exp<cos0  . 

This  argument  is  intimately  connected  with  a  well-known  result  in 
statistical  mechanics  (see  e.g.  Joos  (1947))  or.  the  distribution  of  thermally 
agitated  magnetic  dipoles,  moment  m,  in  a  parallel  field  H  .  There  it  is 
found  that  ic=mH/kT  -where  T  is  the  absolute  temperature  and  k  here 
stands  for  Boltzmann's  constant. 

In  the  discussion  of  Kendall  (1974),  Reuter  suggested  another  diffusion 
model  for  (8).  Let  particles  be  steadily  released  from  the  origin  of  the 
sphere  ||x||=l  and  record  where  they  first  hit  .  The  distribution  of 
first  hits  will  be  uniform  without  any  drift  but  if  the  drift  is  constant  it 
will  have  the  density  (8).  To  get  an  intuitive  proof  of  this  result,  let  f 
be  the  equilibrium  concentration  of  particles  at  any  point  in  or  on  the  sphere 
when  they  are  steadily  produced  at  the  rate  of  1  per  unit  time  at  the  origin. 
Let  them  diffuse  (but  not  interact  with  each  other)  in  a  medium  that  moves 
with  an  arbitrary  velocity  v(x)  .  The  answer  we  seek  is  3f/3nL  ,  the 
normal  derivative  of  f  on  ||x|(=1  .  If  V  is  any  small  volume  within  the 
sphere  with  boundary  3V  ,  the  loss  of  particles  from  V  due  to  transport  is 

/  n*(vf)dS  ^  | V | d i v ( vf ) 

3V 

where  n  Is  an  outward  normal  and  dS  an  area  element  on  3V  .  The  gain 
due  to  diffusion  is 

/fir |v|v!f 

3V  an 
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If  V  is  a  vanishingly  small  volume  that  does  not  include  the  origin  and 
equilibrium  is  attained,  the  concentration  f  satisfies 

2f  -  div  v  f  =  0  ,  ||x||*0.1  (23) 

f  -  0  ,  llxl|*l 

This  is,  of  course,  a  generalization  of  (22).  To  ensure  a  suitable  source 
of  particles  at  the  origin  we  must  demand  that 

^  <*2)  • 

■*  2“-  log  p  (q=2)  , 

as  r=  |1  x  ||-K)  . 

For  the  special  case,  q=2  ,  vx*c  ,  vy=0  ,  we  may  try  f=exp(kx)  g(x,y) 
in  (23).  It  is  readily  seen  that  if  we  choose  k*c/2  ,  g  must  satisfy 

2g  «  c2  g  , 

T 

g  *  0  ,  r2=x2+y2=l  , 

9  *  log  r  »  r*° 

But  (24)  implies  that  g  is  a  function  only  of  r  .  Hence 
f(x,y)  *  g(r)  exp  |  rcose  , 

so  that  3f/3r  on  the  boundary  r*l  is  proportional  to  exp(c/2  cose)  . 
Inspection  shows  that  the  proof  trivially  extends  to  any  number  of  q  dimen¬ 
sions.  Finally  by  choosing  other  velocity  fields,  other  distributions  on 


the  sphere  may  be  obtained.  This  is  technically  difficult  unless  the  velocity 
fields  are  generated  by  a  potential  as  in  classical  hydrodynamics. 

Lecture  2  concluded  with  visual  comparisons  of  many  distributions  on 


the  circle  and  sphere. 
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3.  INFERENCE  PROBLEMS  ON  ftq 
3.1  Introduction 

Testing  whether  a  random  sample  of  directions  Xp...,x  has  been 
drawn  from  the  uniform  distribution  is  possibly  the  oldest  significance  test 
problem.  Bernoulli  (1734)  considered  whether  the  orientations  of  the 
planetary  orbits  were  random.  Watson  (1970)  reconsidered  the  problem  with 
modern  data  and  used  the  normals  to  the  orbital  planes,  directed  by  the 
righthand  rule.  An  intuitive  and  approximate  test,  using  the  length  of  the 
vector  sum  7.13  of  the  9  normals,  may  be  based  on  Rayleigh's  result. 

Naturally  the  null  hypothesis  is  strongly  rejected.  By  using  Neyman-Pearson 
theory,  elegant  and  sensitive  methods  may  be  tailor-made  when  one  has  certain 
alternatives  in  mind.  These  will  be  discussed  in  Section  3.4. 

Much  of  the  literature  is  concerned  with  estimation  and  testing  prob¬ 
lems  when  the  data  are  assumed  to  be  drawn  from  the  von  Mises-Arnold-Fisher 
density  (8).  This  will  be  briefly  outlined  in  Section  3.2.  Similar  theory 
and  methods  have  been  developed  for  other  specific  densities  but  space  does 
not  permit  mentioning  them. 

One  is  rarely  certain  that  a  specific  distributional  form  obtains  so 
all  methods  of  analysis  should  not  be  too  sensitive  i.e.,  they  should  be 
robust.  One  may  check  the  behavior,  by  computer  simulation,  of  the  methods 
mentioned  In  the  previous  paragraph  to  see  how  resistant  they  are  to  changes 
in  distributional  form  and  outliers  in  the  data.  Or  better,  try  to  design 
new  methods  of  analysis  that  will  be  robust.  Another  method  is  to  make 
few  distributional  assumptions  but  to  assume  that  the  samples  are  large 
enough  to  permit  fairly  good  approximations  to  be  made--essentially  that 
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a  Central  Limit  Theorem  effect  will  operate.  This  is  the  topic  of  Section  3.3 


3.2  Inference  for  the  von-Mises-Arnold-Fisher  distribution 
The  flavor  of  this  topic  is  best  seen  by  discussing  only  the  case  q=3  . 
Then  the  density  on  the  sphere  is 

,(x)  *  JFSTShT  8X0  x  x'a  •  xi°  <25> 

Clearly  when  r=0  ,  this  becomes  the  uniform  density,  (4n)"'  .  As  tc— .  , 
all  the  probability  concentrates  about  the  mode  a  .  The  distribution  is 
always  rotational ly  symmetrical  about  a  .  Of  course  ||a||=l  . 

If  a  sample  is  drawn  from  (25),  the  logarithm  of  the  likeli¬ 

hood  of  the  data  is,  to  a  constant, 

n  log(K/sinh<)  +  <  (Zx . ) ^a  (26) 

This  is  maximized  by  choosinq  a  parallel  to  RfZx^  .  Hence  the  maximum 
likelihood  estimator  a  of  a  is  the  direction  of  R*Exi  ,  the  intuitive 
estimator  we  met  in  Section  1.  Thus  a*R/R  ,  where  R*||Ex^||  .  The  value  of 
k  which  then  maximizes  (26)  with  a  instead  of  a  ,  tc  say,  satisfies 

coth  ic  -  i  *  £  .  (27) 

If  ic>3  and  n  is  not  too  small,  K*k  where 

k  *  &  (28) 

the  Intuitive  accuracy  estimator  we  met  in  Section  1.  If  a  were  known,  and 
we  write  X*a'R  ,  the  estimator  of  k  would  be  fairly  accurately 
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It  Is  of  course  trivial  to  compute  the  exact  maximum  likelihood  (m.£.) 
estimators. 

To  find  a  confidence  cone  for  the  true  modal  direction  a  ,  we  consider 
the  problem  of  testing  the  null  hypothesis  Hg  that  the  true  modal  direction 
is  a  .  This  is  how  the  analogous  problem  for  the  linear  normal  distribution 
is  solved.  There  one  can  consider  the  analysis  of  variance  identity 

Z(Xj-y0)2  -  I(xj-x)2  +  n(x-yQ)2  (30) 

along  with  the  distributional  identity 

°2*n  =  ^n-l  +  °2*2i  (31) 

where  the  terms  on  the  righthand  side  of  (31)  are  independent.  To  check 
where  x  is  too  far  from  yQ  for  the  null  hypothesis  to  be  likely,  we 
notice  that  the  ratio 


n(x-uQ)2  _  x? 

=  ^ 


(32) 


wnose  distribution  is  free  of  unknown  parameters.  In  fact,  if  one  multiplies 

both  sides  of  (32)  by  (n-1),  the  lefthand  side  is  t2  and  the  righthand  side 

is  distributed  as  F,  „  ,  . 

l  ,n- 1 

Returning  to  our  problem,  the  analogue  of  (30)  is 


n  -  X  =  n-R  +  R-x 


(33) 


because  we  saw  in  Section  1  that  n-X  and  n-R  were  measures  of  dispersion 
of  the  data  about  the  true  and  estimated  modes.  In  the  next  paragraph  we 


will  see  that  we  may  assert,  if  k  is  not  small,  that  approximately 

2<(n-X)  -  x|n  .  2x(n-R)  -  .  2k(R-X)  *  (34) 

and  that  the  last  two  expressions  are  independent.  Hence  we  have  the 
analogue  of  (32) 


R-X  _ 


x2(n-l ) 


(35) 


which,  if  multiplied  by  (n-1)  leads  to  the  Fg  2(n-l)  distribution. 
Large  values  of  (32)  and  (35)  both  lead  us  to  reject  the  null  hypothesis. 
Since  X=Rcose  where  0  is  the  angle  between  the  true  and  sample  modal 
directions,  the  lefthand  side  of  (33)  is 


R(l-cos0)/(N-R) 

so  that  the  semi-angle  095  of  a  95%  confidence  cone  is  defined,  to  good 
accuracy, by  the  equation 

(n-1 ) (1 -COS0)  R  _  r  fqc#! 

N  ^R  F2,2(n-1 )  t36) 

To  justify  (34),  we  note  that  the  Fisher  density  (25),  put  in 
spherical  polar  angles  with  x'ascos0  ,  yields  a  joint  density  of  6,4> 


k  exp(<cos0)  sin0  , 
4irslnhK 

■  k  e-K(1-cos9)s1ne  •  jL 


If  k  is  large.  Thus  <p  is  uniformly  distributed  on  [0,2ir)  independently 
of  0  and  w=<(l-cos0)  has  a  standard  exponential  distribution.  Thus 


2w*xl  and  exp(-uj)  has  a  uniform  distribution  on  [0,1]  .  These  results 
were  needed  in  Section  1  to  justify  a  rough  goodness  of  fit  procedure.  The 
assertions  (34)  now  become  good  guesses.  In  fact  one  must  verify  all  this 
intuitive  approach  by  mundane  methods.  But  if  one  understands  it  one  can 
guess  how  much  more  complicated  problems  should  and  can  be  solved  e.g.  in  the 
lectures  this  was  illustrated  with  two  sample  problems.  This  adds 
insights  instead  of  merely  mechanically  applying  classical  inference  methods 
to  derive  procedures. 

Other  fully  specified  distributions  may  be  more  appropriate  than  (25) 
in  which  case  similar  techniques  may  be  used. 

Mardia  (1972)  is  primarily  a  summary  of  this  aspect  of  our  subject, 
for  q=2,3  . 

3.3  Robust  methods 

One  is  rarely  confident  that  one's  data  is  a  sample  from  a  specific 
distribution  so  it  is  ridiculous  to  use  an  analysis  which  is  sensitive  to 
small  changes  in  the  parental  distributional  form.  Any  procedure  (e.g.  that 
in  (36))  can  be  checked  by  computer  simulation.  One  simply  invents  a  distribu 
tion,  draws  random  samples  from  it,  computes  the  statistic  again  and  again  and 
compares  the  observed  with  theoretical  proportions.  I  found  e.g.  that  the 
test  procedure  implied  by  (36)  was  very  robust  but  that  related  tests  for 
comparing  k's  were  not— see  Watson  (1967). 

Instead  of  checking  whether  a  procedure  is  robust,  it  perhaps  better  to 
set  out  to  design  procedures  which  will  be  robust.  This  is  an  active  research 
area  nowadays  in  which  Huber,  Hampel  and  Tukey  have  provided  key  ideas. 
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The  most  studied  problem  is:  how  to  make  sound  inferences  about  the  center 
of  a  symmetric  density  on  the  line.  It  is  assumed  that  the  density  behaves, 
around  the  center,  like  the  Gaussian  but  may  further  out  have  much  heavier 
tails. 

The  analogous  problem  on  Q3  would  be  to  assume  a  rotational ly 
symmetric  distribution 


f  =  c(ic;f0)f0(<  a"x) 


(37) 


If  f  is  exp  this  (25). 
that  the  estimator  of  a  , 

n  f'(ie  S'x.) 
j=l  V*  *>xj} 


If  the  m.i,,  method  is  applied  to  (37)  we  find 
&  ,  must  be  parallel  to 

x.  ,  (38) 


in  contrast  to  Zxi  for  the  Fisher  distribution.  The  trick  now  is  to  design 

this  "weighted"  sum  so  that  it  is  not  too  sensitive  to  very  aberrant  vectors. 

This  is  one  of  my  current  research  projects. 

If  large  samples  are  available,  it  is  intuitively  clear  that  we  should 

be  able  to  learn  from  the  data  itself  something  about  the  shape  of  the 

distribution  sampled  and  so  to  design  methods  that  will  be  effective  whatever 

the  true  distribution  is.  No  such  "adaptive"  methods  now  exist. 

-1/2  n 

For  large  n  ,  n  '  lx.*  will  have  approximately  a  q  dimensional 

1  J 

Gaussian  distribution  with  a  mean  y  and  covariance  £  defined  by 


y 


*  /  x  f  du 


I  *  /  (x-y)(x-y)'fdw 


(39) 


The  approach  was  first  use  i  Sengupta  and  J.S.  Rao  (1966)  and  in  the  latter 
thesis  (Rao,  1969),  compared  with  my  analysis  of  variance  (explained  briefly 


-23- 


in  the  last  section)  when  f  is  the  von-Mises  distribution.  However  no 
assumptions  about  f  need  to  be  made  and  no  effort  was  made  to  estimate  f 
or  to  "tailor"  the  analysis  to  be  particularly  effective  when  f  is  "near" 
the  exptcx'a  form. 

Wellner  (1980),  in  an  unpublished  paper,  took  a  similar  approach  but 
assumed  that  the  data  came  from  an  axially  symmetric  distribution  on 
i.e.  that 


f  =  f(a'x)  .  (40) 

For  general  q  ,  it  is  easy  to  prove  that,  instead  of  (39), 

y  3  pa  ,  | p| <1  | 

\  (41) 

Z  =  a2aa"  +  02(Iq-aa")  , J 

where  a  and  8  are  simple  functions  of  E(a'x)  ,  E(a'x)2  .  The  procedures 

which  result  will  be  more  efficient  than  those  of  Rao  if  the  rotational 
symmetry  is  true. 

In  fact  the  commonest  deviation  from  the  von-Mises,  Arnold,  Fisher 
distribution  is  lack  of  rotational  symmetry  but,  except  for  my  simulation 
studies,  there  is  no  published  work  on  this  problem.  Oval  shaped  clusters  of 
data  on  the  sphere  in  paleomagnetism  are  often  attributed  to  a  mixture 

of  distributions  with  different  modal  vectors.  Again,  no  studies  have  been 

made  of  the  effect  of  lack  of  independence  on  estimators  of  modal  directions— 
this  can  have  a  disastrous  effect  in  linear  problems  and  may  be  expected  to 
do  the  same  on  the  sphere. 
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3.4  Testing  whether  f  is  uniform 

If  on  the  null  hypothesis  Hq  ,  f^"1  and  on  the  alternative  Hj ,  f=fj  , 

then  the  fteyman- Pearson  Lemma  tells  us  to  reject  Hg  in  favor  of  if 
n 

n  fx(x.)  is  too  large.  Suppose  fj  =  f (0  x)  where  0  is  a  rotation 
j=l  J  q  q 

matrix.  If  0q  is  unknown,  it  is  reasonable  to  demand  a  test  statistic 

which  does  not  depend  upon  what  0q  is  i.e.  to  demand  an  invariant  test. 

This  leads  to  the  test  statistic 

I  n  f(0  xJdO.  =  ave  nf(0_x,)  (42) 

j=i  q  J  q  oq  q  J 

The  statistic  (42)  may  be  trivially  evaluated  when  q=2  because  it 
will  naturally  be  written 
2tt  n 

/  n  f (0 -+(f))d<J)  (43) 

0  1  J 

In  the  particular  case  where  f(9+4>)  =  exptccos (e+d> )/2tt  I0(<)  ,  (43)  is 
proportional  to  I0(kR)  which  (since  k>0)  increases  monotonically  with  R 
(the  length  of  the  sum  of  the  data  vectors).  Thus  we  can  assert  that  in  this 
case  the  so-called  Rayleigh  test--reject  uniformity  if  R  is  too  large— is 
the  best  invariant  test.  This  test  makes  intuitive  sense  whenever  the 
alternative  is  uni -modal. 

If  the  alternative  density  is  very  far  from  uniformity  it  should  be 
easy  to  design  a  sensitive  test  when  one  has  enough  data.  But  if  the  alterna¬ 
tive  density  is  close  to  uniformity,  more  care  is  obviously  required.  Beran 
(1968)  gave  an  elegant  theory  for  testing  for  uniformity  on  compact  homogeneous 
spaces.  To  sketch  this  in  our  setting,  we  suppose  a  sequence  of  alternatives 
to  uniformity  defined  by 
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^U)  =  Wq1  +  tc{f(Oqx)  -  w"1}  ,  k-0  .  (44) 

The  integral  of  fK(x)  over  nq  will  be  unity  and  fK(x)  will  be  non¬ 
negative  for  k  small  enough  if  f(x)  is  bounded  on  .  Of  course, 
fK(x )  -*•  oi”1  ,  the  uniform  density  as  <-*0  . 

Setting  X=kw  ,  Ave  Ilf  (Ox.)  is  proportional  to 

M  i  J 

Ave  n  (1  +  X(o)  f  (0  x  .)  -  1}  ) 

H  H  J 

=  Ave  [1  +  X  E  (u)a  f  (0Qx.)-l)  +  X2  EEkffOx.l-DMCOx.-l)] 
j  q  q  J  j*k  q  q  J  q  q  k 

plus  smaller  terms.  The  average  of  the  coefficient  of  X  is  zero  and  the 
coefficient  of  X2  may  be  simplified  by  the  identity  EEa .a.=(Ea .  )2-  a?  . 

J  K  J  J 

Noting  that 


Ave  (a)qf(°qxj)~  )2  independent  of  Xj  , 

Beran  thus  shows  that  the  best  invariant  test  for  this  sequence  of  local 
alternative  hypotheses  is  based  on  large  values  of 

Ave  [E  Coj  f (0  x. )-l }  ]2  (45) 

0  j  q  q  J 

q  J 

To  complete  the  test  we  need  the  distribution  of  (45)  when  the  data 
actually  come  from  a  uniform  distribution.  This  is  naturally  calculated  by 
Fourier  methods.  If  we  take  a  circle  of  unit  perimeter  and  set 
f (9 )  »  cmexpi2inne  ,  it  is  easily  shown  that 


(46) 


E{-F(e  -+4>)-l }  =  I  c_expi2ir4>m  I  expi2Trm0.  , 

3  m^O  m  j  3 


1  » 

/  [E{f{8.+<J>)-1}]2d<J>  -  2  Z  |c_| 2  |Eexpi2-rrme .  | 2  .  (47) 

0  J  1  m  j  J 


Now  it  may  be  shown  that,  as  n-»°  , 


expi2irme.  |2 

J 


Xi 


,  m- 1,2,..., 


(48) 


and  that  these  random  variables  are  independent.  Hence  the  asymptotic 
distribution  of 


[Z{f(ej+<)))-l>]2d* 


(49) 


is  that  of 


?ISJ2  x? 


(50) 


The  statistic  (47)  has  an  intuitive  interpretation.  We  will  not  pause  to 
give  this  or  to  explain  how  the  distribution  of  (50)  may  be  obtained.  The 
U2  statistic  of  Watson  (1961)  is  a  special  case  of  (50)  and  is  a  circular 
variant  of  the  Cram4r-von  Mises  statistic. 

Beran's  work  was  motivated  by  Ajne  (1968)  who  defined  special  sequences 
of  local  and  distant  alternatives, and  Watson's  use  of  Fourier  methods.  It 
will  be  noted  that  to  get  statistics  of  the  Kolmogorov  type,  which  use  the 
supremum,  it  Is  necessary  to  use  distant  alternatives.  This  was  explored 
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further  in  Watson  (1974),  in  the  E.J.G.  Pitman  Festschrift. 

An  important  point  to  notice  here  is  that  sample  distribution  func¬ 
tions  will  not  usually  arise--they  are  natural  only  on  the  line.  Further, 
if  the  circle  is  a  guide,  supremum  type  tests  can  only  be  justified  on 
rather  absurd  grounds.  However  their  mathematical  interest  has  led  to  an 
enormous  literature. 

The  topic  of  testing  uniformity  is  one  of  great  mathematical  interest 
since  it  may  be  treated  in  greater  generality  than  many  statistical  problems. 
The  papers  of  Gine  (1975),  Wellner  (1979),  Prentice  (1978)  give  a  flavor  of 
this  work. 
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4.  CONCLUDING  REMARKS 

In  these  three  lectures,  I  have  tried  to  illustrate,  by  reference  to 
directional  data,  the  three  sides  of  statistics: 

•  Data  Analysis  -  especially  the  use  of  computers  and  graphics, 

•  Modelling  -  use  of  stochastic  processes, 

•  Inference  -  Estimation  and  Testing,  Robustness. 

I  have  made  no  effort  to  cover  the  entire  spectrum  of  problems,  solved  and 
unsolved,  that  arise” with  directional  data.  Many  topics  which  I  consider 
to  be  important  and  interesting  have  been  ignored. 

Further  we  have  seen  that  linear  and  spherical  data  must  be  treated 
differently— the  mathematical  structure  of  the  sample  and  parameter  spaces 
plays  an  essential  role.  For  data  in  IRq  ,  we  have  well  defined  notions  of 
mean,  variance,  covariance,  correlation.  These  quantities  are  still  not 
satisfactorily  defined  on  .  Again  for  parameters  in  lRq  we  have 
(modulo  some  technical  arguments)  firm  ideas  about  what  we  mean  by  a  "qood" 
or  "best"  estimator.  On  ,  we  have  NONE! 

We  have  seen  how  Pitman's  early  work  on  location  and  scale  parameters, 
and  invariance  can  be  greatly  extended  to  data  in  homogeneous  spaces.  A 
greatly  extended  general  theory  of  statistics  could  be  developed.  The  very 
simple  practical  problems  with  unit  vectors  with  which  we  began  have  led  us 
into  unknown  territory. 

Finally  I  would  like  to  thank  Prof.  D.R.  McNeil  for  bringing  me  back 
to  my  native  land  and  the  Australian  Mathematical  Society  for  the  Invitation 
to  speak  to  their  21st  Summer  Research  Institute. 
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